Factors Affecting Web Page Similarity

نویسندگان

  • Anastasios Tombros
  • Zeeshan Ali
چکیده

Tools that allow effective information organisation, access and navigation are becoming increasingly important on the Web. Similarity between web pages is a concept that is central to such tools. In this paper, we examine the effect that content and layout-related aspects of web pages have on web page similarity. We consider the textual content contained within common HTML tags, the structural layout of pages, and the query terms contained within pages. Our study shows that combinations of factors can yield more promising results than individual factors, and that different aspects of web pages affect similarities between pages in a different manner. We found a number of factors that, when taken into account, can result in effective measures of similarity between web pages. Query information in particular, proved to be important for the effective organisation of web pages.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Different Strokes for Different Folks: An Analysis of Similarity and Diversity in Web Search

Relying purely on query-page similarity when ranking Web search results limits the scope of the result set to the detriment of search performance. In this paper we propose that introducing diversity into the ranking metric can increase topic coverage without adversely affecting result relevance in the face of vague queries.

متن کامل

An Efficient Approach for Near-duplicate page detection in web crawling

The drastic development of the World Wide Web in the recent times has made the concept of Web Crawling receive remarkable significance. The voluminous amounts of web documents swarming the web have posed huge challenges to the web search engines making their results less relevant to the users. The presence of duplicate and near duplicate web documents in abundance has created additional overhea...

متن کامل

Factors Affecting Information Retrieval

Web is a largest information repository containing web pages, continuously expanding with time thus effectively and efficiently searching relevant information on the web is being a challenge. In recent years large numbers of search engines have been introduced which are still dealing with few problems of indexing and retrieval of relevant information. Therefore it becomes essential to evaluate ...

متن کامل

تشخیص ناهنجاری روی وب از طریق ایجاد پروفایل کاربرد دسترسی

Due to increasing in cyber-attacks, the need for web servers attack detection technique has drawn attentions today. Unfortunately, many available security solutions are inefficient in identifying web-based attacks. The main aim of this study is to detect abnormal web navigations based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...

متن کامل

Hybrid Adaptive Educational Hypermedia ‎Recommender Accommodating User’s Learning ‎Style and Web Page Features‎

Personalized recommenders have proved to be of use as a solution to reduce the information overload ‎problem. Especially in Adaptive Hypermedia System, a recommender is the main module that delivers ‎suitable learning objects to learners. Recommenders suffer from the cold-start and the sparsity problems. ‎Furthermore, obtaining learner’s preferences is cumbersome. Most studies have only focused...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005